ICCS Summer school 2023
Generally speaking, most neural networks are fit (trained) using stochastic gradient descent (SGD), or some variant of it.
To understand the basics of how one might fit a function with SGD, let’s do it with a straight line: \[y=mx+c\]
Question—when we differentiate a function, what do we get?
Consider:
\[y = mx + c\]
\[\frac{dy}{dx} = m\]
\[y = x\]
\[\frac{dy}{dx} = 1\]
The negative gradient, \(-\frac{dy}{dx}\), points in the direction of steepest descent.
To fit a function, we essentially want to create a model which describes data.
We therefore need a way of measuring how a model’s predictions deviate from our observations.
| \(x_{i}\) | \(y_{i}\) |
|---|---|
| 1.0 | 2.1 |
| 2.0 | 3.9 |
| 3.0 | 6.2 |
We can measure the distance between \(f(x_{i})\) and \(y_{i}\).
Normally we might consider the mean-squared error:
\[L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}\]
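As a quick sketch (using PyTorch, which we use throughout), we can compute the MSE for the table above against an illustrative candidate model \(f(x) = 2x\):

```python
import torch

# Observations from the table above
xs = torch.tensor([1.0, 2.0, 3.0])
ys = torch.tensor([2.1, 3.9, 6.2])

# An illustrative candidate model: f(x) = 2x
preds = 2.0 * xs

# Mean-squared error between predictions and observations
mse = ((ys - preds) ** 2).mean()
print(mse)  # tensor(0.0200)
```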
We can differentiate the loss function w.r.t. each parameter in the model \(f\).
We can use the direction of steepest descent to iteratively ‘nudge’ the parameters in a direction which reduces the loss.
Model: \(f(x) = mx + c\)
Data: \(\{x_{i}, y_{i}\}\)
Loss: \(\frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\)
\[L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - (mx_{i} + c)\right)^{2} \]
\[m_{n + 1} = m_{n} - l_{r}\frac{\partial L}{\partial m}\]
\[c_{n + 1} = c_{n} - l_{r}\frac{\partial L}{\partial c}\]
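These update rules can be implemented directly. A minimal sketch, using the analytic gradients of the MSE for the straight-line model and the data from the table above (the learning rate and iteration count are illustrative):

```python
import numpy as np

# Data from the table above
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.1, 3.9, 6.2])

m, c = 0.0, 0.0  # initial parameter guesses
lr = 0.05        # learning rate l_r

for _ in range(2000):
    residuals = ys - (m * xs + c)           # y_i - f(x_i)
    dL_dm = -2.0 * np.mean(xs * residuals)  # dL/dm
    dL_dc = -2.0 * np.mean(residuals)       # dL/dc
    m -= lr * dL_dm                         # nudge m downhill
    c -= lr * dL_dc                         # nudge c downhill

print(m, c)  # close to the least-squares fit (m ≈ 2.05, c ≈ -0.03)
```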
To fit a model we need:
A model.
Some data.
A loss function.
An optimisation procedure (often SGD or one of its variants).
All in all, ’tis quite simple.
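These four ingredients map directly onto PyTorch objects. A minimal sketch, reusing the straight-line fit from earlier (the learning rate and iteration count are illustrative):

```python
import torch
from torch import nn

# 1. A model: a straight line f(x) = mx + c
model = nn.Linear(1, 1)

# 2. Some data (from the table above)
xs = torch.tensor([[1.0], [2.0], [3.0]])
ys = torch.tensor([[2.1], [3.9], [6.2]])

# 3. A loss function: mean-squared error
loss_fn = nn.MSELoss()

# 4. An optimisation procedure: SGD
optimiser = torch.optim.SGD(model.parameters(), lr=0.05)

for _ in range(2000):
    optimiser.zero_grad()
    loss = loss_fn(model(xs), ys)
    loss.backward()   # autograd computes dL/dm and dL/dc
    optimiser.step()  # nudge the parameters downhill
```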
Neural networks are just functions.
We can ‘train’, or fit, them as we would any other function.
With neural networks, differentiating the loss function is a bit more complicated.
We won’t go through any more maths on the matter—learning resources on the topic are in no short supply.
\[a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)\]
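A single layer of this form can be sketched in PyTorch; here \(\sigma\) is assumed to be ReLU, and the layer sizes are illustrative:

```python
import torch
from torch import nn

# One fully-connected layer: a_{l+1} = sigma(W_l a_l + b_l)
layer = nn.Linear(in_features=4, out_features=8)  # holds W_l and b_l
sigma = nn.ReLU()                                  # the nonlinearity

a_l = torch.randn(1, 4)     # a batch of one input vector
a_next = sigma(layer(a_l))  # apply W a + b, then sigma
print(a_next.shape)  # torch.Size([1, 8])
```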
Fully-connected neural networks are often applied to tabular data.
Normally people use neural networks for one of two things:
Classification: assigning a semantic label to something—e.g. is this a dog or a cat?
Regression: Estimating a continuous quantity such as mass or volume.
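The two tasks typically pair with different loss functions: cross-entropy for classification, MSE for regression. A sketch with made-up numbers:

```python
import torch
from torch import nn

# Classification: cross-entropy over class scores ('logits')
logits = torch.tensor([[2.0, 0.5]])  # scores for (dog, cat), say
label = torch.tensor([0])            # the true class index
clf_loss = nn.CrossEntropyLoss()(logits, label)

# Regression: mean-squared error on a continuous target
pred_mass = torch.tensor([4.2])
true_mass = torch.tensor([4.0])
reg_loss = nn.MSELoss()(pred_mass, true_mass)
```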
In this workshop-lecture-thing, we will implement some straightforward neural networks in PyTorch, and use them for different classification and regression problems.
PyTorch is a deep learning framework that can be used in both Python and C++.
See the PyTorch website: https://pytorch.org/
In this exercise, you will train a fully-connected neural network to classify the species of penguins based on certain physical features.
Thanks to Jack Atkinson for suggesting this dataset.
In this exercise, you will train a fully-connected neural network to predict the mass of penguins based on other physical features.
Thanks (again) to Jack Atkinson for suggesting this dataset.
Look at the torch.nn.Conv1d docs
torch.nn.Conv2d docs. Image source: https://medium.com/techiepedia/binary-image-classifier-cnn-using-tensorflow-a3f5d6746697
torch.nn.AdaptiveAvgPool2d docs.
torch.nn.AdaptiveMaxPool2d docs.
torchvision.models docs.
In this exercise we’ll train a CNN to classify hand-written digits in the MNIST dataset.
See the MNIST database wiki for more details.
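A small CNN of this shape can be sketched as follows; the architecture here is illustrative, not the workshop’s exact model:

```python
import torch
from torch import nn

# A small CNN sketch for 1-channel 28x28 images (e.g. MNIST digits).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # pool each feature map to a single value
    nn.Flatten(),
    nn.Linear(32, 10),        # ten digit classes
)

logits = model(torch.randn(8, 1, 28, 28))  # a batch of eight fake images
print(logits.shape)  # torch.Size([8, 10])
```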
In this exercise, we’ll train a CNN to estimate the centre \((x_{\text{c}}, y_{\text{c}})\) and the \(x\) and \(y\) radii of an ellipse defined by \[ \frac{(x - x_{\text{c}})^{2}}{r_{x}^{2}} + \frac{(y - y_{\text{c}})^{2}}{r_{y}^{2}} = 1 \]
The ellipse, and the background, will have random colours chosen uniformly on \(\left[0,\ 255\right]^{3}\).
In short, the model must learn to estimate \(x_{\text{c}}\), \(y_{\text{c}}\), \(r_{x}\) and \(r_{y}\).
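Since all four targets are continuous, this is a regression problem with a four-number output head. A sketch (architecture and image size are illustrative, not the workshop’s exact model):

```python
import torch
from torch import nn

# Regression-head sketch: a CNN whose final layer outputs the four
# ellipse parameters (x_c, y_c, r_x, r_y).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # RGB input
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 4),  # four continuous targets
)

params = model(torch.randn(2, 3, 64, 64))  # two fake 64x64 RGB images
print(params.shape)  # torch.Size([2, 4])
```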